General Database Statistics Using Entropy Maximization

Authors

  • Raghav Kaushik
  • Christopher Ré
  • Dan Suciu
Abstract

We propose a framework in which query sizes can be estimated from arbitrary statistical assertions on the data. In its most general form, a statistical assertion states that the size of the output of a conjunctive query over the data is a given number. A very simple example is a histogram, which makes assertions about the output sizes of several range queries. Our model also allows much more complex assertions that include joins and projections. To model such complex statistical assertions, we propose to use the Entropy-Maximization (EM) probability distribution. In this model, any consistent set of statistics has a precise semantics, and every query has a precise size estimate. We show that several classes of statistics can be solved in closed form.
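As a rough illustration of the idea in the abstract, the sketch below brute-forces the entropy-maximizing distribution for a toy case. It is not the authors' algorithm. It assumes a single unary relation R(A) over a domain of n values, one statistic read as a constraint on the expected size of R, and a maximum-entropy solution of the form P(I) proportional to alpha^|I|. The function name maxent_estimate and all parameters are hypothetical.

```python
from itertools import product


def maxent_estimate(n, d, query_values):
    """Estimate the size of a selection query under a toy max-entropy model.

    Assumptions (not taken from the paper's text): R(A) ranges over the
    domain {0, ..., n-1}, the only statistic is E[|R|] = d with 0 < d < n,
    and the max-entropy distribution has the form P(I) ~ alpha ** |I|.
    """
    # Every possible instance of R, encoded as a membership bit vector.
    instances = list(product([0, 1], repeat=n))

    def expected(alpha, f):
        # Expected value of f(instance) under P(I) ~ alpha ** |I|.
        weights = [alpha ** sum(inst) for inst in instances]
        z = sum(weights)
        return sum(w * f(inst) for w, inst in zip(weights, instances)) / z

    # E[|R|] grows monotonically with alpha, so bracket and bisect.
    lo, hi = 0.0, 1.0
    while expected(hi, sum) < d:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected(mid, sum) < d:
            lo = mid
        else:
            hi = mid
    alpha = (lo + hi) / 2

    # Expected number of tuples of R whose value falls in the query set.
    return expected(alpha, lambda inst: sum(inst[v] for v in query_values))


if __name__ == "__main__":
    print(maxent_estimate(5, 1, [0, 1, 2]))
```

In this toy case the result agrees with the closed form k * d / n, where k is the number of values the query selects, which is the familiar uniformity estimate. Enumerating all 2^n instances is only feasible for very small n and is used here purely to make the definition concrete.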

Similar resources

Statistical Mechanics of Classical N-Particle System of Galaxies in the Expanding Universe

For the distribution of classical non-interacting particles we use Maxwell-Boltzmann statistics. However, these statistics are not workable for classical interacting particles (galaxies). We attempt to modify Maxwell-Boltzmann statistics by incorporating a gravitational interaction term. The number of ways in which N-particles can have pair interaction due to gravitational interaction ...

Probabilistic Query Answering Using Views

The paper studies two probabilistic query evaluation problems. The general setting is that we are given a probability distribution on all possible database instances and have to compute the probability of a tuple belonging to the query’s answer. In the deterministic view problem, we are given a set of view instances and are asked to determine the probability of a tuple belonging to a query’s an...

Derivation of equilibrium and time-dependent solutions to M/M/∞//N and M/M/∞ queueing systems using entropy maximization

Queueing theory has provided the basis for remarkable successes in the performance modeling and analysis of computer systems.6,19,21 Because it is clear that computer systems do not satisfy assumptions made by the stochastic process models that are used, this success has been somewhat puzzling; it appears that queueing theory equations have wider applicability than is suggested by their classic...

A Novel Content Based Image Retrieval Model Based on the Most Relevant Features Using Particle Swarm Optimization

Content Based Image Retrieval (CBIR) is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases. Content-based image retrieval (CBIR) depends on extracting the most relevant features according to a feature selection technique. The integration of multiple features may cause the curse of dimensionality a...

Measure Selection: Notions of Rationality and Representation Independence

We take another look at the general problem of selecting a preferred probability measure among those that comply with some given constraints. The dominant role that entropy maximization has obtained in this context is questioned by arguing that the minimum information principle on which it is based could be supplanted by an at least as plausible "likelihood of evidence" principle. We then r...

Publication date: 2009